CNET's AI Journalist Appears to Have Committed Extensive Plagiarism

CNET's AI-generated articles appear to show deep structural similarities to work previously published elsewhere, amounting to plagiarism.

*Image: Getty Images*

The site initially addressed widespread backlash to the bot-written articles by assuring readers that a human editor was carefully fact-checking them all prior to publication.

Afterward, though, Futurism found that a substantial number of errors had been slipping into the AI’s published work. CNET, a titan of tech journalism that sold for $1.8 billion back in 2008, responded by issuing a formidable correction and slapping a warning on all the bot’s prior work, alerting readers that the posts’ content was under factual review. Days later, its parent company Red Ventures announced in a series of internal meetings that it was temporarily pausing the AI-generated articles at CNET and various other properties including Bankrate, at least until the storm of negative press died down.

Now, a fresh development may make efforts to spin the program back up even more controversial for the embattled newsroom. In addition to those factual errors, a new Futurism investigation found extensive evidence that the CNET AI’s work has demonstrated deep structural and phrasing similarities to articles previously published elsewhere, without giving credit. In other words, it looks like the bot directly plagiarized the work of Red Ventures competitors, as well as human writers at Bankrate and even CNET itself.

Jeff Schatten, a professor at Washington and Lee University who has been examining the rise of AI-enabled misconduct, reviewed numerous examples of the bot’s apparent cribbing that we provided. He found that they “clearly” rose to the level of plagiarism.

We asked Schatten what would happen if a student turned in an essay with a comparable number of similarities to existing documents with no attribution.

“They would be sent to the student-run ethics council and given the repeated nature of the behavior would almost certainly be expelled from the university,” he replied.

The bot’s misbehavior ranges from verbatim copying to moderate edits to significant rephrasings, all without properly crediting the original. In at least some of its articles, it appears that virtually every sentence maps directly onto something previously published elsewhere.

Take this excerpt, for instance, from a recent article by the CNET AI about overdraft protection:

How to avoid overdraft and NSF fees

Overdraft fees and NSF fees don’t have to be a common consequence. There are a few steps you can take to avoid them.

And compare it to this verbiage from a previously published article in Forbes Advisor, a Red Ventures competitor:

How to Avoid Overdraft and NSF Fees

Overdraft and NSF fees need not be the norm. There are several tools at your disposal to avoid them.

Sure, the bot’s version altered the capitalization and swapped out a few words for impressively lateral-minded synonyms — “the norm” becomes “a common consequence,” for instance, and “several tools” becomes “a few steps” — along with a few minor changes to the syntax. But apart from those semantic tweaks, the two sentences are nearly identical.

Here’s another excerpt from the same article by CNET’s AI financial writer:

You may be able to receive low balance alerts from your bank’s mobile app, so you know if your account balance is dropping below a certain threshold.

Now compare it to this section from another previously published article, this one from The Balance, another Red Ventures competitor:

You can sign up for low-balance alerts through most banks to alert you when your account hits a certain amount.

Again, it seems clear that the AI is simply parsing through existing text and making small modifications to obscure the source.

Sometimes the similarities are almost comical in their lack of subtlety. Take the first sentence of this article, also published by CNET’s AI:

Gift cards are an easy go-to when buying a present for someone.

And compare it to the first sentence of this previously published Forbes article:

Gift cards are an easy-to-please present for just about anyone.

The kicker on that one? Check out the almost imperceptible difference between those two articles’ headlines. Here’s the CNET AI’s title:

Can You Buy a Gift Card With a Credit Card?

And here’s what Forbes ran with for a headline:

Can You Buy Gift Cards With a Credit Card?

That’s right: the only difference is switching “Gift Cards” to a singular.

Here’s another example, from the same AI-generated CNET article about overdraft fees:

What is overdraft protection?

Overdraft protection is an optional feature offered by banks to prevent the rejection of a charge on a checking account with insufficient funds.

Which, it turns out, appears to be a word-salad rephrasing of a line from this article on Investopedia, another Red Ventures competitor:

What Is Overdraft Protection?

Overdraft protection is an optional service that prevents the rejection of charges to a bank account… that are in excess of the available funds in the account.

The AI also appears to sometimes borrow language from writers at CNET’s sister site Bankrate without giving credit. For example, look at this line from an article published by CNET’s AI back in November:

Becoming an authorized user can help you avoid applying for a card on your own, which is a major benefit if you currently have bad credit or no credit history.

And compare it to this wording, previously published by a Bankrate writer:

Becoming an authorized user also lets you avoid having to apply for a card on your own, which is a major benefit if you currently have bad credit or no credit history at all.

All told, a pattern quickly emerges. Essentially, CNET’s AI seems to approach a topic by examining similar articles that have already been published and ripping sentences out of them. As it goes, it makes adjustments, sometimes minor and sometimes major, to the original sentence’s syntax, word choice, and structure. Sometimes it mashes two sentences together, or breaks one apart, or assembles chunks into new Frankensentences. Then it seems to repeat the process until it’s cooked up an entire article.
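For readers who want to put a rough number on this kind of overlap, here is a minimal sketch in Python. It is not the method used in this investigation, which relied on side-by-side reading. It assumes the standard library's `difflib` as a stand-in similarity measure, and the helper names `word_similarity` and `longest_shared_run` are hypothetical. The two sample sentences are the CNET and Bankrate lines quoted above.

```python
import re
from difflib import SequenceMatcher

# Sample sentences quoted earlier in this article.
CNET_LINE = (
    "Becoming an authorized user can help you avoid applying for a card on "
    "your own, which is a major benefit if you currently have bad credit or "
    "no credit history."
)
BANKRATE_LINE = (
    "Becoming an authorized user also lets you avoid having to apply for a "
    "card on your own, which is a major benefit if you currently have bad "
    "credit or no credit history at all."
)


def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def word_similarity(a: str, b: str) -> float:
    """Return a 0-to-1 score for how much of the two word sequences match in order."""
    return SequenceMatcher(None, words(a), words(b), autojunk=False).ratio()


def longest_shared_run(a: str, b: str) -> list[str]:
    """Return the longest run of consecutive words that appears in both texts."""
    wa, wb = words(a), words(b)
    matcher = SequenceMatcher(None, wa, wb, autojunk=False)
    match = matcher.find_longest_match(0, len(wa), 0, len(wb))
    return wa[match.a : match.a + match.size]


if __name__ == "__main__":
    print(f"similarity: {word_similarity(CNET_LINE, BANKRATE_LINE):.2f}")
    print("longest shared run:", " ".join(longest_shared_run(CNET_LINE, BANKRATE_LINE)))
```

A high score on a single sentence proves little on its own, since common phrases recur across personal finance writing. The pattern matters when, as here, sentence after sentence in one article lines up with previously published text.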

A current Red Ventures employee also reviewed examples of the bot’s seemingly lifted work.

“You ever copy your homework off of somebody,” they quipped, “but they told you to kind of rephrase it?”

“It poses the question of what kind of institutions do CNET and Bankrate want to be seen as,” they continued. “They’re just taking these articles and rephrasing a couple of things.”


In short, a close examination of the work produced by CNET’s AI makes it seem less like a sophisticated text generator and more like an automated plagiarism machine, casually pumping out pilfered work that would get a human journalist fired.

Perhaps, at the end of the day, none of this should be terribly surprising. At their core, machine learning systems work by taking in an immense pile of “training data,” processing it with sophisticated algorithms, and ending up with a model that can produce similar work on demand.

Investigators have sometimes found examples of AI plagiarizing its own training data. In 2021, for instance, researchers from Johns Hopkins University, New York University and Microsoft found that text-generating AIs “sometimes copy substantially, in some cases duplicating passages over 1,000 words long from the training set.”

As such, the question of exactly how CNET’s disastrous AI was trained may end up taking center stage as the drama continues to unfold. At a CNET company meeting late last week, The Verge reported at the time, the outlet’s executive vice president of content and audience refused to tell staff — many of them acclaimed tech journalists who have written extensively about the rise of machine learning — what data had been used to train the AI.

The legality of using data to train an AI without the consent of the people who created that data is currently being tested by several lawsuits against the makers of prominent image generators, and could become a flashpoint in the commercialization of the tech.

“If a student presented the equivalent of what CNET has produced for an assignment in my class, and if they did not cite their sources, then I would definitely count it as plagiarism,” said Antony Aumann after reviewing examples of the CNET AI’s phrasing alongside similar passages from other outlets. Aumann is a philosophy professor at Northern Michigan University who recently made headlines when he discovered that one of his own students had submitted an essay generated using ChatGPT.

“Now, there is some dispute among academics about exactly what plagiarism is,” he continued. “Some scholars consider it a form of stealing; other scholars regard it as a kind of lying. I think of it in the latter way. Plagiarism involves representing something as your own that is in fact not your own. And that appears to be what CNET is doing.”

CNET did not respond to examples of the bot’s seemingly cribbed writing, nor to questions about this story.

In a sense, the relentless ineptitude of the company’s braindead AI probably obfuscates many of the thornier themes we’re likely to see emerge as the tech continues to spread into the workplace and information ecosystems.

Schatten, for instance, warned that issues around AI and intellectual property are likely to get more ambiguous and difficult to detect as AI systems continue to improve, or even as publishers start to experiment with more advanced systems that already exist. (Red Ventures has declined to say what AI it’s using, though the editor-in-chief of CNET has said that it’s not ChatGPT.)

“The CNET example is noteworthy because whatever AI they were using was not drawing from the entirety of the internet and carefully coming up with a new mosaic, but rather just lifting more or less word for word from existing stories,” Schatten said. “But the more sophisticated AIs of today, and certainly the AIs of the future, will do a better job of hiding the origins of the material.”

“And especially once AIs are drawing from the writing of other AIs, which themselves are quoting AI (dark, I know) it might become quite difficult to detect,” he added.

In a practical sense, it seems increasingly obvious that CNET and Red Ventures deployed the AI system and started blasting its articles out to the site’s colossal audience without ever really scrutinizing its output. It wasn’t just that the architects of the program missed obvious factual errors, but that they appear never to have checked whether the system’s work might have been poached.

And to be fair, why would they? As The Verge reported in a fascinating deep dive last week, the company’s primary strategy is to post massive quantities of content, carefully engineered to rank highly in Google, and loaded with lucrative affiliate links.

For Red Ventures, The Verge found, those priorities have transformed the once-venerable CNET into an “AI-powered SEO money machine.”